Introducing Softness into Inductive Queries on String Databases

Authors

  • Ieva Mitasiunaite
  • Jean-François Boulicaut

Abstract

In many application domains (e.g., WWW mining, molecular biology), large string datasets are available yet remain under-exploited. The inductive database framework assumes that both such datasets and the various patterns holding within them can be queried. In this setting, queries that return patterns are called inductive queries, and solving them is one of the core research topics in data mining. Constraint-based mining techniques on string datasets have indeed been studied extensively. Efficient algorithms make it possible to compute complete collections of patterns (e.g., substrings) that satisfy conjunctions of monotonic and/or anti-monotonic constraints in large datasets (e.g., conjunctions of minimal and maximal support constraints). We consider fault tolerance and softness to be extremely important issues for tackling real-life data analysis. We address some of the open problems in evaluating soft-support constraints, which require computing the soft occurrences of patterns instead of the classical exact-matching occurrences. Solving soft-support constraints efficiently is challenging because softness prevents the clever use of monotonicity properties. We describe our proposal and provide an experimental validation on real-life clickstream data that confirms the added value of this approach.
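The abstract contrasts classical exact-match support with soft support. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method: it mines substrings under a conjunction of a minimal (anti-monotonic) and a maximal (monotonic) support constraint with a levelwise search, and it computes a soft support using a hypothetical soft-occurrence definition (a pattern softly occurs in a sequence if some window of the same length is within `max_mismatches` Hamming mismatches). The function names, the mismatch-based definition, and the toy "clickstream" data are all assumptions made for illustration.

```python
from typing import List, Sequence


def support(pattern: str, dataset: Sequence[str]) -> int:
    """Exact support: number of sequences containing `pattern` as a contiguous substring."""
    return sum(1 for seq in dataset if pattern in seq)


def occurs_softly(pattern: str, seq: str, max_mismatches: int) -> bool:
    """Hypothetical soft occurrence (an assumption, not the paper's definition):
    some window of `seq` with the same length as `pattern` differs from it in
    at most `max_mismatches` positions (Hamming distance)."""
    m = len(pattern)
    for start in range(len(seq) - m + 1):
        window = seq[start:start + m]
        mismatches = sum(1 for a, b in zip(pattern, window) if a != b)
        if mismatches <= max_mismatches:
            return True
    return False


def soft_support(pattern: str, dataset: Sequence[str], max_mismatches: int) -> int:
    """Number of sequences in which `pattern` has at least one soft occurrence."""
    return sum(1 for seq in dataset if occurs_softly(pattern, seq, max_mismatches))


def mine_substrings(dataset: Sequence[str], min_sup: int, max_sup: int) -> List[str]:
    """Levelwise mining of substrings with min_sup <= support <= max_sup.

    The minimal support constraint is anti-monotonic: every prefix of a frequent
    substring is itself frequent, so extending only the frequent length-k patterns
    by one symbol on the right generates every frequent length-(k+1) pattern.
    The maximal support constraint is monotonic and is applied as a filter.
    """
    alphabet = sorted(set("".join(dataset)))
    level = [c for c in alphabet if support(c, dataset) >= min_sup]
    result: List[str] = []
    while level:
        # Keep current-length patterns that also satisfy the maximal support constraint.
        result.extend(p for p in level if support(p, dataset) <= max_sup)
        # Anti-monotonic pruning: only frequent patterns are extended.
        level = [p + c for p in level for c in alphabet
                 if support(p + c, dataset) >= min_sup]
    return result


if __name__ == "__main__":
    # Hypothetical clickstream sessions, one letter per visited page.
    clicks = ["abc", "abd", "bcd"]
    print(mine_substrings(clicks, 2, 2))  # ['a', 'c', 'd', 'ab', 'bc']
    print(support("abc", clicks), soft_support("abc", clicks, 1))  # 1 2
```

With a fixed mismatch budget like this one, soft support is still anti-monotonic (every substring of a softly occurring pattern also occurs softly), so the sketch does not reproduce the difficulty the abstract points to. It only shows how counting soft occurrences differs from exact matching: on the sample data, `"abc"` has exact support 1 but soft support 2.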

Similar resources

An Algebra for Inductive Query Evaluation

Inductive queries are queries that generate pattern sets. This paper studies properties of boolean inductive queries, i.e. queries that are boolean expressions over monotonic and anti-monotonic constraints. More specifically, we introduce and study algebraic operations on the answer sets of such queries and show how these can be used for constructing and optimizing query plans. Special attentio...

Cognitive Database: A Step towards Endowing Relational Databases with Artificial Intelligence Capabilities

We propose Cognitive Databases, an approach for transparently enabling Artificial Intelligence (AI) capabilities in relational databases. A novel aspect of our design is to first view the structured data source as meaningful unstructured text, and then use the text to build an unsupervised neural network model using a Natural Language Processing (NLP) technique called word embedding. This model...

Mining String Data under Similarity and Soft-Frequency Constraints: Application to Promoter Sequence Analysis (Mitašiūnaitė)

An inductive database is a database that contains not only data but also patterns. Inductive databases are designed to support the KDD process. Recent advances in inductive databases research have given rise to generic solvers capable of solving inductive queries that are arbitrary Boolean combinations of anti-monotonic and monotonic constraints. They are designed to mine different types of p...

Partially Ordered Regular Languages for Graph Queries

In this paper we present an extension of regular languages to support graph queries. The proposed extension is based on the introduction of a partial order on the strings of the languages. We extend regular expressions and regular grammars by introducing partial orders on strings and production rules, respectively. The relations between regular expressions and regular grammars are analyzed. We ...

SPADA: A Spatial Association Discovery System

This paper presents a spatial association discovery system, named SPADA, which has been developed according to the theoretical framework of inductive databases. Our approach considers inductive databases as deductive databases with an integrated inductive component and relies on techniques borrowed from the field of Inductive Logic Programming (ILP). In SPADA, an ILP module supports the process...

Publication date: 2006